Overview

Dataset statistics

Number of variables: 12
Number of observations: 1460
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 102.8 KiB
Average record size in memory: 72.1 B

Variable types

Numeric: 8
Categorical: 4

Warnings

TotalBsmtSF is highly correlated with 1stFlrSF and 1 other field (High correlation)
1stFlrSF is highly correlated with TotalBsmtSF and 2 other fields (High correlation)
GrLivArea is highly correlated with 1stFlrSF and 2 other fields (High correlation)
TotRmsAbvGrd is highly correlated with GrLivArea and 1 other field (High correlation)
GarageArea is highly correlated with SalePrice (High correlation)
SalePrice is highly correlated with TotalBsmtSF and 4 other fields (High correlation)
TotalBsmtSF is highly correlated with 1stFlrSF and 1 other field (High correlation)
1stFlrSF is highly correlated with TotalBsmtSF and 1 other field (High correlation)
GrLivArea is highly correlated with TotRmsAbvGrd and 1 other field (High correlation)
TotRmsAbvGrd is highly correlated with GrLivArea and 1 other field (High correlation)
GarageArea is highly correlated with SalePrice (High correlation)
SalePrice is highly correlated with TotalBsmtSF and 4 other fields (High correlation)
TotalBsmtSF is highly correlated with 1stFlrSF (High correlation)
1stFlrSF is highly correlated with TotalBsmtSF (High correlation)
GrLivArea is highly correlated with SalePrice (High correlation)
SalePrice is highly correlated with GrLivArea (High correlation)
TotalBsmtSF is highly correlated with GarageArea and 4 other fields (High correlation)
GarageArea is highly correlated with TotalBsmtSF and 7 other fields (High correlation)
HouseStyle is highly correlated with TotRmsAbvGrd and 2 other fields (High correlation)
SalePrice is highly correlated with TotalBsmtSF and 9 other fields (High correlation)
FullBath is highly correlated with SalePrice and 6 other fields (High correlation)
GarageCars is highly correlated with GarageArea and 6 other fields (High correlation)
TotRmsAbvGrd is highly correlated with HouseStyle and 5 other fields (High correlation)
GrLivArea is highly correlated with TotalBsmtSF and 9 other fields (High correlation)
Neighborhood is highly correlated with GarageArea and 8 other fields (High correlation)
YearBuilt is highly correlated with GarageArea and 7 other fields (High correlation)
1stFlrSF is highly correlated with TotalBsmtSF and 5 other fields (High correlation)
OverallQual is highly correlated with TotalBsmtSF and 8 other fields (High correlation)
TotalBsmtSF has 37 (2.5%) zeros (Zeros)
GarageArea has 81 (5.5%) zeros (Zeros)

Reproduction

Analysis started: 2021-07-31 00:51:14.484215
Analysis finished: 2021-07-31 00:51:20.862386
Duration: 6.38 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

YearBuilt
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 112
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1971.267808
Minimum: 1872
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 1872
5-th percentile: 1916
Q1: 1954
median: 1973
Q3: 2000
95-th percentile: 2007
Maximum: 2010
Range: 138
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 30.20290404
Coefficient of variation (CV): 0.01532156307
Kurtosis: -0.4395519416
Mean: 1971.267808
Median Absolute Deviation (MAD): 25
Skewness: -0.6134611725
Sum: 2878051
Variance: 912.2154126
Monotonicity: Not monotonic
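
Several of the YearBuilt figures above follow arithmetically from others in the same block. As a rough consistency check, the sketch below recomputes the range, IQR, coefficient of variation and variance from the reported values. The variable names are illustrative, and the inputs are copied from this report rather than computed from the raw data.

```python
# Values copied from the YearBuilt summary in this report.
minimum, maximum = 1872, 2010
q1, q3 = 1954, 2000
mean = 1971.267808
std = 30.20290404

value_range = maximum - minimum   # compare with the reported Range
iqr = q3 - q1                     # compare with the reported IQR
cv = std / mean                   # compare with the reported CV
variance = std ** 2               # compare with the reported Variance

print(value_range, iqr, round(cv, 8), round(variance, 4))
```

Small differences in the last digits are expected because the inputs are themselves rounded.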
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
2006                67     4.6%
2005                64     4.4%
2004                54     3.7%
2007                49     3.4%
2003                45     3.1%
1976                33     2.3%
1977                32     2.2%
1920                30     2.1%
1959                26     1.8%
1998                25     1.7%
Other values (102)  1035   70.9%

Value               Count  Frequency (%)
1872                1      0.1%
1875                1      0.1%
1880                4      0.3%
1882                1      0.1%
1885                2      0.1%
1890                2      0.1%
1892                2      0.1%
1893                1      0.1%
1898                1      0.1%
1900                10     0.7%

Value               Count  Frequency (%)
2010                1      0.1%
2009                18     1.2%
2008                23     1.6%
2007                49     3.4%
2006                67     4.6%
2005                64     4.4%
2004                54     3.7%
2003                45     3.1%
2002                23     1.6%
2001                20     1.4%

Neighborhood
Categorical

HIGH CORRELATION

Distinct: 25
Distinct (%): 1.7%
Missing: 0
Missing (%): 0.0%
Memory size: 11.5 KiB

Value               Count
NAmes               225
CollgCr             150
OldTown             113
Edwards             100
Somerst             86
Other values (20)   786

Length

Max length: 7
Median length: 7
Mean length: 6.494520548
Min length: 5

Characters and Unicode

Total characters: 9482
Distinct characters: 38
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CollgCr
2nd row: Veenker
3rd row: CollgCr
4th row: Crawfor
5th row: NoRidge

Common Values

Value               Count  Frequency (%)
NAmes               225    15.4%
CollgCr             150    10.3%
OldTown             113    7.7%
Edwards             100    6.8%
Somerst             86     5.9%
Gilbert             79     5.4%
NridgHt             77     5.3%
Sawyer              74     5.1%
NWAmes              73     5.0%
SawyerW             59     4.0%
Other values (15)   424    29.0%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
names               225    15.4%
collgcr             150    10.3%
oldtown             113    7.7%
edwards             100    6.8%
somerst             86     5.9%
gilbert             79     5.4%
nridght             77     5.3%
sawyer              74     5.1%
nwames              73     5.0%
sawyerw             59     4.0%
Other values (15)   424    29.0%

Most occurring characters

Value               Count  Frequency (%)
r                   931    9.8%
e                   905    9.5%
l                   622    6.6%
d                   506    5.3%
s                   486    5.1%
o                   483    5.1%
m                   439    4.6%
N                   425    4.5%
w                   414    4.4%
C                   407    4.3%
Other values (28)   3864   40.8%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    6764   71.3%
Uppercase Letter    2718   28.7%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
r                   931    13.8%
e                   905    13.4%
l                   622    9.2%
d                   506    7.5%
s                   486    7.2%
o                   483    7.1%
m                   439    6.5%
w                   414    6.1%
i                   351    5.2%
a                   345    5.1%
Other values (10)   1282   19.0%
Uppercase Letter
Value               Count  Frequency (%)
N                   425    15.6%
C                   407    15.0%
S                   352    13.0%
A                   298    11.0%
T                   188    6.9%
W                   157    5.8%
O                   150    5.5%
B                   118    4.3%
R                   115    4.2%
E                   100    3.7%
Other values (8)    408    15.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               9482   100.0%

Most frequent character per script

Latin
Value               Count  Frequency (%)
r                   931    9.8%
e                   905    9.5%
l                   622    6.6%
d                   506    5.3%
s                   486    5.1%
o                   483    5.1%
m                   439    4.6%
N                   425    4.5%
w                   414    4.4%
C                   407    4.3%
Other values (28)   3864   40.8%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               9482   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
r                   931    9.8%
e                   905    9.5%
l                   622    6.6%
d                   506    5.3%
s                   486    5.1%
o                   483    5.1%
m                   439    4.6%
N                   425    4.5%
w                   414    4.4%
C                   407    4.3%
Other values (28)   3864   40.8%

FullBath
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Value               Count
2                   768
1                   650
3                   33
0                   9

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1460
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 1
5th row: 2

Common Values

Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

Length

Histogram of lengths of the category

Pie chart

Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

Most occurring characters

Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      1460   100.0%

Most frequent character per category

Decimal Number
Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

Most occurring scripts

Value               Count  Frequency (%)
Common              1460   100.0%

Most frequent character per script

Common
Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               1460   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
2                   768    52.6%
1                   650    44.5%
3                   33     2.3%
0                   9      0.6%

HouseStyle
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Memory size: 11.5 KiB

Value               Count
1Story              726
2Story              445
1.5Fin              154
SLvl                65
SFoyer              37
Other values (3)    33

Length

Max length: 6
Median length: 6
Mean length: 5.910958904
Min length: 4

Characters and Unicode

Total characters: 8630
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2Story
2nd row: 1Story
3rd row: 2Story
4th row: 2Story
5th row: 2Story

Common Values

Value               Count  Frequency (%)
1Story              726    49.7%
2Story              445    30.5%
1.5Fin              154    10.5%
SLvl                65     4.5%
SFoyer              37     2.5%
1.5Unf              14     1.0%
2.5Unf              11     0.8%
2.5Fin              8      0.5%

Length

Histogram of lengths of the category

Pie chart

Value               Count  Frequency (%)
1story              726    49.7%
2story              445    30.5%
1.5fin              154    10.5%
slvl                65     4.5%
sfoyer              37     2.5%
1.5unf              14     1.0%
2.5unf              11     0.8%
2.5fin              8      0.5%

Most occurring characters

Value               Count  Frequency (%)
S                   1273   14.8%
o                   1208   14.0%
r                   1208   14.0%
y                   1208   14.0%
t                   1171   13.6%
1                   894    10.4%
2                   464    5.4%
F                   199    2.3%
.                   187    2.2%
5                   187    2.2%
Other values (8)    631    7.3%

Most occurring categories

Value               Count  Frequency (%)
Lowercase Letter    5336   61.8%
Uppercase Letter    1562   18.1%
Decimal Number      1545   17.9%
Other Punctuation   187    2.2%

Most frequent character per category

Lowercase Letter
Value               Count  Frequency (%)
o                   1208   22.6%
r                   1208   22.6%
y                   1208   22.6%
t                   1171   21.9%
n                   187    3.5%
i                   162    3.0%
v                   65     1.2%
l                   65     1.2%
e                   37     0.7%
f                   25     0.5%
Uppercase Letter
Value               Count  Frequency (%)
S                   1273   81.5%
F                   199    12.7%
L                   65     4.2%
U                   25     1.6%
Decimal Number
Value               Count  Frequency (%)
1                   894    57.9%
2                   464    30.0%
5                   187    12.1%
Other Punctuation
Value               Count  Frequency (%)
.                   187    100.0%

Most occurring scripts

Value               Count  Frequency (%)
Latin               6898   79.9%
Common              1732   20.1%

Most frequent character per script

Latin
Value               Count  Frequency (%)
S                   1273   18.5%
o                   1208   17.5%
r                   1208   17.5%
y                   1208   17.5%
t                   1171   17.0%
F                   199    2.9%
n                   187    2.7%
i                   162    2.3%
L                   65     0.9%
v                   65     0.9%
Other values (4)    152    2.2%
Common
Value               Count  Frequency (%)
1                   894    51.6%
2                   464    26.8%
.                   187    10.8%
5                   187    10.8%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               8630   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
S                   1273   14.8%
o                   1208   14.0%
r                   1208   14.0%
y                   1208   14.0%
t                   1171   13.6%
1                   894    10.4%
2                   464    5.4%
F                   199    2.3%
.                   187    2.2%
5                   187    2.2%
Other values (8)    631    7.3%

OverallQual
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.099315068
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 5
median: 6
Q3: 7
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.382996547
Coefficient of variation (CV): 0.2267462053
Kurtosis: 0.09629277836
Mean: 6.099315068
Median Absolute Deviation (MAD): 1
Skewness: 0.2169439278
Sum: 8905
Variance: 1.912679448
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
5                   397    27.2%
6                   374    25.6%
7                   319    21.8%
8                   168    11.5%
4                   116    7.9%
9                   43     2.9%
3                   20     1.4%
10                  18     1.2%
2                   3      0.2%
1                   2      0.1%

Value               Count  Frequency (%)
1                   2      0.1%
2                   3      0.2%
3                   20     1.4%
4                   116    7.9%
5                   397    27.2%
6                   374    25.6%
7                   319    21.8%
8                   168    11.5%
9                   43     2.9%
10                  18     1.2%

Value               Count  Frequency (%)
10                  18     1.2%
9                   43     2.9%
8                   168    11.5%
7                   319    21.8%
6                   374    25.6%
5                   397    27.2%
4                   116    7.9%
3                   20     1.4%
2                   3      0.2%
1                   2      0.1%

TotalBsmtSF
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 721
Distinct (%): 49.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1057.429452
Minimum: 0
Maximum: 6110
Zeros: 37
Zeros (%): 2.5%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 519.3
Q1: 795.75
median: 991.5
Q3: 1298.25
95-th percentile: 1753
Maximum: 6110
Range: 6110
Interquartile range (IQR): 502.5

Descriptive statistics

Standard deviation: 438.7053245
Coefficient of variation (CV): 0.4148790481
Kurtosis: 13.25048328
Mean: 1057.429452
Median Absolute Deviation (MAD): 234.5
Skewness: 1.524254549
Sum: 1543847
Variance: 192462.3617
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   37     2.5%
864                 35     2.4%
672                 17     1.2%
912                 15     1.0%
1040                14     1.0%
816                 13     0.9%
768                 12     0.8%
728                 12     0.8%
894                 11     0.8%
780                 11     0.8%
Other values (711)  1283   87.9%

Value               Count  Frequency (%)
0                   37     2.5%
105                 1      0.1%
190                 1      0.1%
264                 3      0.2%
270                 1      0.1%
290                 1      0.1%
319                 1      0.1%
360                 1      0.1%
372                 1      0.1%
384                 7      0.5%

Value               Count  Frequency (%)
6110                1      0.1%
3206                1      0.1%
3200                1      0.1%
3138                1      0.1%
3094                1      0.1%
2633                1      0.1%
2524                1      0.1%
2444                1      0.1%
2396                1      0.1%
2392                1      0.1%

1stFlrSF
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 753
Distinct (%): 51.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1162.626712
Minimum: 334
Maximum: 4692
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 334
5-th percentile: 672.95
Q1: 882
median: 1087
Q3: 1391.25
95-th percentile: 1831.25
Maximum: 4692
Range: 4358
Interquartile range (IQR): 509.25

Descriptive statistics

Standard deviation: 386.587738
Coefficient of variation (CV): 0.3325123481
Kurtosis: 5.745841482
Mean: 1162.626712
Median Absolute Deviation (MAD): 234.5
Skewness: 1.376756622
Sum: 1697435
Variance: 149450.0792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
864                 25     1.7%
1040                16     1.1%
912                 14     1.0%
894                 12     0.8%
848                 12     0.8%
672                 11     0.8%
630                 9      0.6%
816                 9      0.6%
483                 7      0.5%
960                 7      0.5%
Other values (743)  1338   91.6%

Value               Count  Frequency (%)
334                 1      0.1%
372                 1      0.1%
438                 1      0.1%
480                 1      0.1%
483                 7      0.5%
495                 1      0.1%
520                 5      0.3%
525                 1      0.1%
526                 1      0.1%
536                 1      0.1%

Value               Count  Frequency (%)
4692                1      0.1%
3228                1      0.1%
3138                1      0.1%
2898                1      0.1%
2633                1      0.1%
2524                1      0.1%
2515                1      0.1%
2444                1      0.1%
2411                1      0.1%
2402                1      0.1%

GrLivArea
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 861
Distinct (%): 59.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1515.463699
Minimum: 334
Maximum: 5642
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 334
5-th percentile: 848
Q1: 1129.5
median: 1464
Q3: 1776.75
95-th percentile: 2466.1
Maximum: 5642
Range: 5308
Interquartile range (IQR): 647.25

Descriptive statistics

Standard deviation: 525.4803834
Coefficient of variation (CV): 0.3467456092
Kurtosis: 4.895120581
Mean: 1515.463699
Median Absolute Deviation (MAD): 326
Skewness: 1.366560356
Sum: 2212577
Variance: 276129.6334
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
864                 22     1.5%
1040                14     1.0%
894                 11     0.8%
1456                10     0.7%
848                 10     0.7%
1200                9      0.6%
912                 9      0.6%
816                 8      0.5%
1092                8      0.5%
1728                7      0.5%
Other values (851)  1352   92.6%

Value               Count  Frequency (%)
334                 1      0.1%
438                 1      0.1%
480                 1      0.1%
520                 1      0.1%
605                 1      0.1%
616                 1      0.1%
630                 6      0.4%
672                 2      0.1%
691                 1      0.1%
693                 1      0.1%

Value               Count  Frequency (%)
5642                1      0.1%
4676                1      0.1%
4476                1      0.1%
4316                1      0.1%
3627                1      0.1%
3608                1      0.1%
3493                1      0.1%
3447                1      0.1%
3395                1      0.1%
3279                1      0.1%

TotRmsAbvGrd
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 12
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.517808219
Minimum: 2
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 5
median: 6
Q3: 7
95-th percentile: 10
Maximum: 14
Range: 12
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.625393291
Coefficient of variation (CV): 0.2493772808
Kurtosis: 0.8807615657
Mean: 6.517808219
Median Absolute Deviation (MAD): 1
Skewness: 0.6763408364
Sum: 9516
Variance: 2.641903349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Value               Count  Frequency (%)
6                   402    27.5%
7                   329    22.5%
5                   275    18.8%
8                   187    12.8%
4                   97     6.6%
9                   75     5.1%
10                  47     3.2%
11                  18     1.2%
3                   17     1.2%
12                  11     0.8%
Other values (2)    2      0.1%

Value               Count  Frequency (%)
2                   1      0.1%
3                   17     1.2%
4                   97     6.6%
5                   275    18.8%
6                   402    27.5%
7                   329    22.5%
8                   187    12.8%
9                   75     5.1%
10                  47     3.2%
11                  18     1.2%

Value               Count  Frequency (%)
14                  1      0.1%
12                  11     0.8%
11                  18     1.2%
10                  47     3.2%
9                   75     5.1%
8                   187    12.8%
7                   329    22.5%
6                   402    27.5%
5                   275    18.8%
4                   97     6.6%

GarageArea
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 441
Distinct (%): 30.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 472.980137
Minimum: 0
Maximum: 1418
Zeros: 81
Zeros (%): 5.5%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 334.5
median: 480
Q3: 576
95-th percentile: 850.1
Maximum: 1418
Range: 1418
Interquartile range (IQR): 241.5

Descriptive statistics

Standard deviation: 213.8048415
Coefficient of variation (CV): 0.452037675
Kurtosis: 0.9170672023
Mean: 472.980137
Median Absolute Deviation (MAD): 120
Skewness: 0.1799809067
Sum: 690551
Variance: 45712.51023
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0                   81     5.5%
440                 49     3.4%
576                 47     3.2%
240                 38     2.6%
484                 34     2.3%
528                 33     2.3%
288                 27     1.8%
400                 25     1.7%
264                 24     1.6%
480                 24     1.6%
Other values (431)  1078   73.8%

Value               Count  Frequency (%)
0                   81     5.5%
160                 2      0.1%
164                 1      0.1%
180                 9      0.6%
186                 1      0.1%
189                 1      0.1%
192                 1      0.1%
198                 1      0.1%
200                 4      0.3%
205                 3      0.2%

Value               Count  Frequency (%)
1418                1      0.1%
1390                1      0.1%
1356                1      0.1%
1248                1      0.1%
1220                1      0.1%
1166                1      0.1%
1134                1      0.1%
1069                1      0.1%
1053                1      0.1%
1052                2      0.1%

SalePrice
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 663
Distinct (%): 45.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180921.1959
Minimum: 34900
Maximum: 755000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 11.5 KiB

Quantile statistics

Minimum: 34900
5-th percentile: 88000
Q1: 129975
median: 163000
Q3: 214000
95-th percentile: 326100
Maximum: 755000
Range: 720100
Interquartile range (IQR): 84025

Descriptive statistics

Standard deviation: 79442.50288
Coefficient of variation (CV): 0.4391000319
Kurtosis: 6.53628186
Mean: 180921.1959
Median Absolute Deviation (MAD): 38000
Skewness: 1.88287576
Sum: 264144946
Variance: 6311111264
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
140000              20     1.4%
135000              17     1.2%
155000              14     1.0%
145000              14     1.0%
190000              13     0.9%
110000              13     0.9%
115000              12     0.8%
160000              12     0.8%
130000              11     0.8%
139000              11     0.8%
Other values (653)  1323   90.6%

Value               Count  Frequency (%)
34900               1      0.1%
35311               1      0.1%
37900               1      0.1%
39300               1      0.1%
40000               1      0.1%
52000               1      0.1%
52500               1      0.1%
55000               2      0.1%
55993               1      0.1%
58500               1      0.1%

Value               Count  Frequency (%)
755000              1      0.1%
745000              1      0.1%
625000              1      0.1%
611657              1      0.1%
582933              1      0.1%
556581              1      0.1%
555000              1      0.1%
538000              1      0.1%
501837              1      0.1%
485000              1      0.1%

GarageCars
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 1.8 KiB

Value               Count
2                   824
1                   369
3                   181
0                   81
4                   5

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1460
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 3
5th row: 3

Common Values

Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Length

Histogram of lengths of the category

Pie chart

Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Most occurring characters

Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Most occurring categories

Value               Count  Frequency (%)
Decimal Number      1460   100.0%

Most frequent character per category

Decimal Number
Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Most occurring scripts

Value               Count  Frequency (%)
Common              1460   100.0%

Most frequent character per script

Common
Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Most occurring blocks

Value               Count  Frequency (%)
ASCII               1460   100.0%

Most frequent character per block

ASCII
Value               Count  Frequency (%)
2                   824    56.4%
1                   369    25.3%
3                   181    12.4%
0                   81     5.5%
4                   5      0.3%

Interactions

[Figure: 64 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
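
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code pandas-profiling runs; the function name `pearson_r` is hypothetical, and the example values are the first five GrLivArea and SalePrice entries from the sample rows of this report.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# First five GrLivArea and SalePrice values from the sample rows.
gr_liv_area = [1710, 1262, 1786, 1717, 2198]
sale_price = [208500, 181500, 223500, 140000, 250000]
r = pearson_r(gr_liv_area, sale_price)
print(round(r, 3))
```

Rescaling or shifting either input, for example expressing SalePrice in thousands, leaves r unchanged, which is the invariance mentioned above.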

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
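
A minimal sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names are hypothetical, and assigning tied values the average of their positions is a common convention assumed here, not something this report states.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A monotonic but nonlinear relationship (y = x cubed).
xs = [1, 2, 3, 4, 5]
ys = [1, 8, 27, 64, 125]
rho = spearman_rho(xs, ys)
```

For these values the ranks of X and Y coincide, so ρ is 1 even though the relationship is not linear.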

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
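
The pair-counting definition can be written out directly. The sketch below implements exactly the formula stated above (often called tau-a); the function name is hypothetical. Library implementations frequently use a tie-adjusted variant (tau-b), so their results can differ when the data contain ties. The example uses the first five TotRmsAbvGrd and GrLivArea values from the sample rows of this report.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    pairs = list(combinations(range(len(xs)), 2))
    concordant = discordant = 0
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    return (concordant - discordant) / len(pairs)

# First five TotRmsAbvGrd and GrLivArea values from the sample rows.
tot_rms_abv_grd = [8, 6, 6, 7, 9]
gr_liv_area = [1710, 1262, 1786, 1717, 2198]
tau = kendall_tau_a(tot_rms_abv_grd, gr_liv_area)
```

Of the 10 pairs in this small example, 6 are concordant, 3 are discordant and 1 is tied, giving τ = 0.3.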

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
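
One way to compute a bias-corrected Cramér's V is sketched below in plain Python. It follows the correction as it is commonly stated for Bergsma's proposal, which is an assumption here; the report does not show its own implementation. The function name is hypothetical, and the example values are the HouseStyle and FullBath entries from the first ten sample rows of this report.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms (assumed form of the Bergsma 2013 adjustment).
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. one variable is constant
    return math.sqrt(phi2_corr / denom)

# HouseStyle and FullBath from the first ten sample rows.
house_style = ["2Story", "1Story", "2Story", "2Story", "2Story",
               "1.5Fin", "1Story", "2Story", "1.5Fin", "1.5Unf"]
full_bath = ["2", "2", "2", "1", "2", "1", "2", "2", "2", "1"]
v = cramers_v_corrected(house_style, full_bath)
```

Ten rows are far too few for a meaningful estimate; the example only illustrates the mechanics.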

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

      YearBuilt  Neighborhood  FullBath  HouseStyle  OverallQual  TotalBsmtSF  1stFlrSF  GrLivArea  TotRmsAbvGrd  GarageArea  SalePrice  GarageCars
0     2003       CollgCr       2         2Story      7            856          856       1710       8             548         208500     2
1     1976       Veenker       2         1Story      6            1262         1262      1262       6             460         181500     2
2     2001       CollgCr       2         2Story      7            920          920       1786       6             608         223500     2
3     1915       Crawfor       1         2Story      7            756          961       1717       7             642         140000     3
4     2000       NoRidge       2         2Story      8            1145         1145      2198       9             836         250000     3
5     1993       Mitchel       1         1.5Fin      5            796          796       1362       5             480         143000     2
6     2004       Somerst       2         1Story      8            1686         1694      1694       7             636         307000     2
7     1973       NWAmes        2         2Story      7            1107         1107      2090       7             484         200000     2
8     1931       OldTown       2         1.5Fin      7            952          1022      1774       8             468         129900     2
9     1939       BrkSide       1         1.5Unf      5            991          1077      1077       5             205         118000     1

Last rows

      YearBuilt  Neighborhood  FullBath  HouseStyle  OverallQual  TotalBsmtSF  1stFlrSF  GrLivArea  TotRmsAbvGrd  GarageArea  SalePrice  GarageCars
1450  1974       NAmes         2         2Story      5            896          896       1792       8             0           136000     0
1451  2008       Somerst       2         1Story      8            1573         1578      1578       7             840         287090     3
1452  2005       Edwards       1         SLvl        5            547          1072      1072       5             525         145000     2
1453  2006       Mitchel       1         1Story      5            1140         1140      1140       6             0           84500      0
1454  2004       Somerst       2         1Story      7            1221         1221      1221       6             400         185000     2
1455  1999       Gilbert       2         2Story      6            953          953       1647       7             460         175000     2
1456  1978       NWAmes        2         1Story      6            1542         2073      2073       7             500         210000     2
1457  1941       Crawfor       2         2Story      7            1152         1188      2340       9             252         266500     1
1458  1950       NAmes         1         1Story      5            1078         1078      1078       5             240         142125     1
1459  1965       Edwards       1         1Story      5            1256         1256      1256       6             276         147500     1